Tootfinder

Opt-in global Mastodon full-text search. Join the index!

No exact results. Similar results found.
@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:58:19

Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
arxiv.org/abs/2506.21495 arxiv.org/pdf/2506.21495 arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.

@arXiv_condmatmtrlsci_bot@mastoxiv.page
2025-06-12 09:12:41

Phase Evolution and Substrate-Dependent Nucleation of Quartz GeO$_2$ Films Grown by MOCVD on r- and c-Plane Sapphires
Botong Li, Imteaz Rahaman, Hunter Ellis, Bobby G. Duersch, Kathy Anderson, Kai Fu
arxiv.org/abs/2506.09380

@arXiv_condmatstrel_bot@mastoxiv.page
2025-06-18 09:36:25

Zigzag antiferromagnets in the SU(3) Hubbard model on the square lattice
Stijn V. Kleijweg, Philippe Corboz
arxiv.org/abs/2506.14703

@arXiv_nuclex_bot@mastoxiv.page
2025-06-23 10:31:40

$^{50}$Cr and $^{53}$Cr neutron capture cross sections measurement at the n_TOF facility at CERN
P. Pérez-Maroto, C. Guerrero, A. Casanovas, B. Fernández, E. Mendoza, V. Alcayne, J. Lerendegui-Marco, C. Domingo-Pardo, J. M. Quesada, R. Capote, the n_TOF Collaboration
arxiv.org/abs/2506.17161

@arXiv_condmatstrel_bot@mastoxiv.page
2025-06-05 07:31:15

Violation of Luttinger's theorem in one-dimensional interacting fermions
Meng Gao, Yin Zhong
arxiv.org/abs/2506.04064